Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets

Neural Information Processing Systems

Mode connectivity (Garipov et al., 2018; Draxler et al., 2018) is a surprising empirical finding: optima discovered by gradient-based training are connected by simple paths in parameter space along which the loss stays almost constant. In other words, the optima are not walled off in separate valleys as hitherto believed. Mode connectivity begs for theoretical explanation.




Rohith Kuditipudi, Xiang Wang, Holden Lee, Yi Zhang, Zhiyuan Li, Wei Hu, Sanjeev Arora, Rong Ge

arXiv.org Machine Learning

Efforts to understand how and why deep learning works have led to a focus on the optimization landscape of training loss. Since optimization to near-zero training loss occurs for many choices of random initialization, it is clear that the landscape contains many global optima (or near-optima). However, the loss can become quite high when interpolating between found optima, suggesting that these optima occur at the bottom of "valleys" surrounded on all sides by high walls. Therefore the phenomenon of mode connectivity (Garipov et al., 2018; Draxler et al., 2018) came as a surprise: optima (at least the ones discovered by gradient-based optimization) are connected by simple paths in the parameter space, on which the loss function is almost constant. In other words, the optima are not walled off in separate valleys as hitherto believed.
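The geometry the abstract describes can be illustrated with a minimal toy example. The loss below is a hypothetical stand-in (not the paper's construction): its global minima form a ring, so the straight line between two optima crosses a high-loss "wall" at the midpoint, while a curved path that follows the valley keeps the loss at (near) zero, which is exactly the mode-connectivity picture.

```python
import numpy as np

# Toy landscape: loss(w) = (||w|| - 1)^2 has a ring of global minima
# (the radius-1 circle). Purely illustrative, not a neural-net loss.
def loss(w):
    return (np.linalg.norm(w) - 1.0) ** 2

w1 = np.array([1.0, 0.0])   # one "optimum" on the ring
w2 = np.array([-1.0, 0.0])  # another "optimum" on the ring

ts = np.linspace(0.0, 1.0, 101)

# Straight-line interpolation: the loss rises to 1.0 at the midpoint
# (the origin) -- the "high walls" seen when naively interpolating.
line_losses = [loss((1 - t) * w1 + t * w2) for t in ts]

# Curved path along the valley (upper half of the unit circle): the
# loss stays ~0 the whole way -- a simple "mode-connecting" path.
arc_losses = [loss(np.array([np.cos(np.pi * t), np.sin(np.pi * t)]))
              for t in ts]

print(max(line_losses))  # barrier height 1.0 on the straight line
print(max(arc_losses))   # ~0 (up to float error) along the valley
```

The point of the sketch is that "connected by a simple path on which the loss is almost constant" is a statement about the shape of the set of minima, not about convexity: the straight chord leaves the valley, but a path that stays inside it exists.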